PACKET VOICE


Digital  packet  voice  has  been  the  topic  of  a  great  deal  of  research  and 
development  work  at  the  Information  Sciences  Institute  (ISI)  and  other  research 
institutions  in  the  DARPA  (Defense  Advanced  Research  Projects  Agency)  community 
since  the  mid  1970s.  DARPA  has  provided  funding  for  the  development  of  this 
technology,  which  offers  the  advantages  of  dynamic  routing,  excellent  transmission 
quality,  and  a  mixture  of  voice  and  data  on  the  same  network.  It  also  provides  an 
inherent  mechanism  for  efficient  channel  utilization  by  the  use  of  sound/silence 
detection. 

Packet  speech  was  first  transmitted  over  the  terrestrial  ARPANET.  This  work 
proved  that  the  concept  of  transmitting  digital  voice  signals  via  packet  networks  was 
valid.  Special  packet  network  voice  protocols  have  evolved  as  a  result  of  this  work, 
providing  a  more  efficient  transport  mechanism  for  voice  signals  in  a  packet 
environment. 

For  this  new  technology  to  be  fully  tested,  it  must  be  made  accessible  to  a  large 
group  of  people  and  put  into  use  on  a  regular  basis.  For  this  purpose,  DARPA  and  the 
Defense  Communications  Agency  are  sponsoring  the  development  of  a  3  Mb/s 
wideband  packet  satellite  network  to  serve  as  a  voice  and  data  link  between  sites  in 
several  locations  around  the  United  States,  allowing  packet  voice  technology  to  be 
demonstrated  and  evaluated  on  a  scale  closer  to  that  of  a  real-world  application. 

To  expand  the  usefulness  and  flexibility  of  the  new  wideband  network  as  a  packet 
voice  communication  system,  it  is  desirable  to  provide  connections  to  the  commercial 
switched  telephone  network  (STN)  at  several  geographically  separated  sites.  The 
STNI (Switched Telephone Network Interface) card [5] pictured in Figure 1 was
developed  at  ISI  to  provide  this  connection. 

This paper describes the implementation of the STNI card and discusses various
aspects  of  its  development,  including  interface  requirements,  sound/silence 
discrimination,  user  tone  signalling,  telephone  system  tone  signalling,  and  disconnect 
detection. 


STNI  CARD 

The  STNI  card  is  designed  to  provide  an  interface  between  the  commercial 
telephone network and a Packet Voice Terminal [7], developed by the M.I.T. Lincoln
Laboratory.  Figure  2  shows  a  typical  scenario  of  a  call  in  which  the  STNI  is  involved. 
The  voice  terminal  handles  the  packet  network  protocol,  providing  a  digital 
connection  to  the  network.  The  STNI  card  is  used  to  answer  calls  from  the  telephone 
system,  present  a  dial  tone,  and  accept  digits  from  the  caller  directing  a  call  to  be 
placed  in  the  digital  packet  voice  network.  Calls  to  the  switched  telephone  network 
may  also  originate  in  the  packet  network;  the  user  routes  them  to  an  STNI  card  and 
requests  it  to  dial  the  distant  telephone  number.  Once  a  call  is  in  progress,  the  STNI 
performs  the  analog/digital  and  digital/analog  conversions  between  the  telephone 
line  signals  and  network  data.  The  card  also  performs  sound/silence  detection  as  a 
means  of  bandwidth  optimization. 


Figure  1:  STNI  card 

The STNI card consists of a Z80 microprocessor with two serial and two parallel
I/O channels, an Intel 2910A μ = 255 pulse code modulation (PCM) CODEC paired
with the Intel 2912 PCM line filter, a DTMF (Dual Tone Multiple Frequency, or
touch-tone) decoder, a telephone line ring detect circuit, and a line control relay
(Figure 3). Other functions which might have been implemented in hardware are
instead performed by software to keep the circuit compact. These include sound
detection, echo suppression, disconnect detection, DTMF dialing, and user tone
feedback (dial tone, busy signal, etc.). ASCII commands sent by the voice terminal
via a 2400 b/s RS-232 serial line instruct the STNI to pick up the telephone line,
dial out, hang up, answer the ringing line, and play tones. The same two-way serial
line is used in the other direction by the STNI to transmit ASCII status messages
back to the voice terminal, indicating a ring-detected condition or received
touch-tone digits. This ASCII command port is also usable with a standard serial
ASCII terminal and provides an excellent debugging aid.

The arrows show the path of a typical call originating at any telephone, dialing into an
STNI card, transmitting up across the wideband network, down to an STNI at another network
site, and on to the distant telephone.

Figure  2:  Typical  use  of  STNI  card 


SOUND  DETECTION 

Perhaps  the  most  critical  function  of  the  STNI  card,  as  it  interfaces  to  analog 
signals,  is  sound  detection.  Bandwidth  compression  is  accomplished  by  taking 
advantage  of  silent  periods  during  speech.  To  provide  maximum  packet  network 
efficiency  without  interfering  with  the  clarity  of  the  transmission,  the  sound  detector 
must  be  accurate  and  fast. 

Sound detection in the STNI is now accomplished as a software function of the
Z80 microprocessor, which analyzes speech data in parcels, each 180 bytes long,
representing 22.5 msec of speech. Statistics are maintained on the data and used by
the Z80 program to determine, on a parcel-by-parcel basis, whether or not a given
parcel is silent. Several techniques and refinements, described below, were
implemented and evaluated before a suitable method was finally developed.

Figure 3: STNI functional diagram


CVSD-Based Sound Detector

Initial  plans  were  to  perform  sound  detection  using  a  continuously  variable  slope 
delta  modulation  (CVSD)  encoder  on  the  STNI  card.  Analysis  of  CVSD  signal  patterns 
should  provide  a  reasonable  basis  for  differentiating  sound  from  silence.  This 
technique  has  been  applied  in  the  past  with  good  results  [2]. 

It  was  later  found  that  the  sound  detection  function  could  be  performed  using  the 
PCM  data,  thus  eliminating  the  need  for  CVSD  hardware  and  saving  space  on  the 
STNI  card.  As  a  result,  the  CVSD-based  detector  was  never  fully  implemented  on  the 
STNI;  instead,  the  hardware  was  removed  from  the  prototype  design. 


VOX


In one common sound detector, usually referred to as a VOX or Voice Operated
Switch, a fixed threshold and a set of delays provide wide-range sound detection.
Sound with amplitude (a) in excess of a fixed threshold (t) for a period in excess of a
fixed delay (d) trips the detector, which is then sustained for a preset hangover time
(s), after which it resets. If a exceeds t during the period of s, s is reset to maximum
and thus the detector remains tripped.
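
The VOX behavior just described can be sketched in C as follows. This is only an illustrative sketch: the structure vox_t, the function vox_step, and the choice to count the delay and hangover in samples are assumptions made for the example, not details of any particular VOX circuit. The sketch also assumes a hangover greater than zero.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical fixed-threshold VOX state; all names are illustrative. */
typedef struct {
    int threshold;   /* t: fixed amplitude threshold                 */
    int delay;       /* d: samples above t required before tripping  */
    int hangover;    /* s: samples the detector is sustained (> 0)   */
    int above;       /* consecutive samples seen above t             */
    int remaining;   /* hangover samples left; 0 means reset         */
} vox_t;

/* Feed one sample; returns true while the detector is tripped. */
bool vox_step(vox_t *v, int sample)
{
    if (abs(sample) > v->threshold) {
        v->above++;
        /* Trip after more than d samples, or restart a running hangover. */
        if (v->remaining > 0 || v->above > v->delay)
            v->remaining = v->hangover;
        return v->remaining > 0;
    }
    v->above = 0;
    if (v->remaining > 0) {
        v->remaining--;          /* hangover runs down during silence */
        return true;
    }
    return false;
}
```

With threshold 10, delay 2, and hangover 3, the first two loud samples are lost before the detector trips, and the detector stays tripped for three quiet samples afterward, which illustrates the slow attack and long sustain listed below.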

Among  the  problems  with  such  a  sound  detector  are  the  following. 

•  It  is  slow  to  respond,  often  resulting  in  the  loss  of  several  syllables  at  the 
beginning  of  a  burst  of  speech. 

•  It  sustains  longer  than  is  desirable. 

•  It  does  not  adapt  to  ambient  noise  conditions. 

A VOX circuit can be adjusted to respond faster at the risk of triggering on noise
pulses and clipping soft syllables. VOXs typically do not react very fast, thus
consuming excessive bandwidth. They also do not adapt to varying background noise
levels. This behavior results in an unnecessarily heavy load on a packet network.

VOX-type detectors are suitable for speaker-phones, voice-activated radio
transmitters, and other applications; however, for packet voice or any speech
multiplexing system in which speed and accuracy are important, their performance is
inadequate.


Bell System Sound Detector for TASI Systems

The sound detection approach to bandwidth compression is not a new idea; it
has been in use for many years in the commercial telephone network. Many specific
implementations have been developed for this purpose.

One such sound detection technique was engineered in the early 1960s by
the Bell System for use in the Time Assignment Speech Interpolation (TASI) system of
trunk usage optimization [1, 6]. This system is divided into two major components, a
speech detector and a computer analysis program.

The speech detector consists of a level detection device and a flip-flop with its
reset line clocked at 200 Hz. If the sound threshold is exceeded during the interval
between resets, the flip-flop is set, indicating sound detected within that frame;
otherwise, the flip-flop will remain in the off state, indicating that nothing was "heard."
The resulting sound/silence status information is then passed to the computer
program to smooth out the spurt-gap pattern, eliminating intersyllable gaps and thus
avoiding chopping off the first parts of words, caused by switching into silence just as
a word is beginning.

Each spurt or gap has a duration of an integral multiple of the 5 msec frame time.
All spurts shorter than some predetermined throw-away time are discarded to
eliminate spurious background noise. Gaps in the remaining stream that are shorter
than a given fill-in time are ignored and considered to be part of a talkspurt.

With this technique, three parameters must be adjusted: threshold for the speech
detector, fill-in time, and throw-away time.

1. Threshold level, above which signals are considered sound and below
which they are silence, was selected by investigating data from analysis of
real speech and by trial and error. If this level were too high, syllables
would frequently be clipped; if it were too low, the "TASI advantage"
would be significantly reduced.

2. Fill-in time is the maximum length of a silence period (gap) to be ignored.
If sound were present, stopped for a time less than or equal to the fill-in
time, and then resumed, no silence would be recognized. If, however, the
silence period exceeded the fill-in time, the entire period would be
considered silent. If the fill-in time were too long, valuable bandwidth
would be wasted; if it were too short, syllables would be lost.

3.  Throw-away  time  prevents  short  spikes  of  noise  from  tripping  the 
detector.  This  is  the  maximum  length  of  a  burst  of  sound  that  can  be 
totally  ignored.  The  sole  function  of  throw-away  time  is  to  conserve 
bandwidth.  In  the  absence  of  this  parameter,  the  user  would  not  be 
noticeably  affected.  The  bandwidth  usage,  however,  would  increase 
substantially. 
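
The throw-away and fill-in rules above can be illustrated with a short C sketch that smooths a recorded sequence of 5 msec frame decisions. It is a batch illustration under stated assumptions, not the Bell System program: the function names, the 0/1 frame buffer, and the decision to leave gaps at the ends of the buffer untouched are all hypothetical choices made for the example. A real-time version would need to delay its output by at least the longer of the two times.

```c
#include <stddef.h>

/* Flip every run of `value` whose length is at most max_len frames.
   If interior_only is nonzero, runs touching either end of the buffer
   are left alone. Frames must hold only 0 or 1. */
static void flip_short_runs(unsigned char *f, size_t n, unsigned char value,
                            size_t max_len, int interior_only)
{
    size_t i = 0;
    while (i < n) {
        size_t start = i;
        while (i < n && f[i] == f[start])
            i++;                              /* f[start..i) is one run */
        if (f[start] != value || i - start > max_len)
            continue;
        if (interior_only && (start == 0 || i == n))
            continue;
        for (size_t k = start; k < i; k++)
            f[k] = (unsigned char)!value;
    }
}

/* frames[i] is 1 if the level detector fired during 5 msec frame i.
   throw_away and fill_in are given in frames. */
void tasi_smooth(unsigned char *frames, size_t n,
                 size_t throw_away, size_t fill_in)
{
    flip_short_runs(frames, n, 1, throw_away, 0);  /* discard short spurts */
    flip_short_runs(frames, n, 0, fill_in, 1);     /* bridge short gaps    */
}
```

Throw-away is applied first so that an isolated noise spike cannot later be joined to a neighboring talkspurt by the fill-in pass.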

The Bell System method has been tested, its parameters have been fine-tuned,
and it is now in wide use throughout the telephone network. The scheme is fast and
reliable, a definite improvement over VOX-type detectors. Due to its fixed threshold,
however, it does not adapt to variable background noise conditions; this rigidity can
cause less than optimal bandwidth usage in medium- to high-noise environments.
Also, excessively quiet sounds such as whispering will be degraded.


A  Dynamic  Sound  Detector 

Human  speech,  due  to  its  nonstationary  characteristics,  is  easily  distinguished 
from  the  relatively  stationary  pattern  of  background  noise.  Noise  tends  to  have 
variations  in  amplitude  significantly  smaller  than  those  of  speech,  occurring  relatively 
slowly.  By  extending  the  same  concepts  behind  the  fixed  threshold  detector,  dynamic 
sound  detectors  [3]  can  therefore  be  constructed  which  learn  the  ambient 
background  noise  level  and  vary  the  threshold,  continuously  adapting  it  for  optimum 
noise  reduction. 

By  analysis  of  short  term  variations  in  the  amplitude  of  the  channel  signal,  noise 
is  differentiated  from  speech  patterns.  Continuous  averaging  of  the  noise  levels  over 
a  short  time  provides  a  reasonable  basis  for  adjustment  of  the  sound  threshold.  Fill-in 
and  throw-away  times  as  applied  in  the  TASI  sound  detector  are  used  to  further 
smooth  out  the  pattern,  providing  an  accurate  indication  of  the  sound/silence  status. 
Such  a  dynamic  detector,  which  can  closely  follow  the  actual  speech  patterns,  is  ideal 
for  the  packet  voice  application. 


STNI Dynamic Sound Detector

The  goal  for  the  STNI  application  was  to  implement  a  sound  detector  that  would 
adapt  to  the  ambient  noise  environment,  react  sufficiently  quickly  so  as  not  to  clip 
syllables,  and  decay  quickly,  to  conserve  as  much  of  the  available  bandwidth  as 
possible.  In  addition,  it  was  desired  that  the  sound  detection  be  accomplished  without 
adding  to  the  hardware  complexity. 

The  result  was  a  completely  software-driven  dynamic-threshold  sound  detection 
system, developed specifically for the STNI. It can directly analyze raw μ = 255 PCM
data,  detect  sound,  and  control  the  speech  flow  in  real  time,  within  the  limitations  of  a 
Z80  microprocessor. 


Once per sample:

    avg = abs(x)/256 + (255*avg)/256;

Once per parcel:

    thresh = minavg + 8;
    if (avg > thresh) {
        sound = TRUE;
        count = count - 1;
        if (count == 0) {
            minavg = minavg + 1;
            count = 16;
        }
    } else {
        sound = FALSE;
        if (avg < minavg) {
            minavg = avg;
            count = 16;
        }
    }

Figure  4:  STNI  software  sound  detection  algorithm 

The STNI sound detection algorithm is shown in Figure 4. The algorithm consists
of two paths: the averaging of PCM amplitude values (avg), performed each sample
time, and the threshold adaptation and sound/silence decision, executed at the end
of each parcel.

In this algorithm, the threshold (thresh) always floats at 8 units above the
minimum average (minavg). Parcels averaging in excess of thresh are passed as
sound; otherwise they are flagged as silence and are not sent over the packet
network. When the average amplitude level (avg) remains above thresh for 16 parcel
times, minavg is incremented, thus increasing thresh by one as well. Whenever avg
drops below minavg, minavg is immediately set equal to avg, adjusting for sudden
drops in the amplitude. Figure 5 shows the performance of this detector on voice
signals and background noise.

The  threshold  requires  a  relatively  long  time  to  climb  high  enough  to  block  out  a 
loud  background  noise  level.  The  sudden  absence  of  the  loud  noise,  however,  will  be 
compensated  for  immediately.  This  effect  is  shown  in  Figure  6. 
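
The slow rise and immediate fall of the threshold can be seen by packaging the per-parcel path of Figure 4 as a function and feeding it a constant level. The following is a minimal sketch; the names detector_t, parcel_step, and parcels_until_blocked are hypothetical and are not taken from the STNI software, which is written in Z80 assembly language.

```c
#include <stdbool.h>

typedef struct {
    int minavg;  /* learned minimum average amplitude            */
    int count;   /* sound parcels left before minavg is raised   */
} detector_t;

/* Per-parcel path of Figure 4; avg is the running average at parcel end.
   Returns true if the parcel is classed as sound. */
bool parcel_step(detector_t *d, int avg)
{
    int thresh = d->minavg + 8;
    if (avg > thresh) {
        if (--d->count == 0) {
            d->minavg += 1;      /* threshold creeps up one unit */
            d->count = 16;
        }
        return true;
    }
    if (avg < d->minavg) {
        d->minavg = avg;         /* threshold falls at once */
        d->count = 16;
    }
    return false;
}

/* Number of parcels a constant level avg is passed before it is blocked. */
int parcels_until_blocked(detector_t *d, int avg)
{
    int n = 0;
    while (parcel_step(d, avg))
        n++;
    return n;
}
```

Starting from minavg = 10 and count = 16, a constant level of 20 is passed for 32 parcels (about 720 msec at 22.5 msec per parcel) before the threshold reaches it, whereas a single quiet parcel afterward pulls minavg down immediately.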

Speech  is  characteristically  perforated  by  momentary  periods  of  silence. 
Therefore,  the  threshold  should  never  have  time  to  climb  above  the  level  of  the 
speech.  Signals  other  than  speech  may  not  survive  as  well.  Continuous  sounds  such 
as  a  dial  tone  or  the  carrier  of  a  dataset,  for  example,  will  eventually  be  blocked  out 
completely. As a result, voice band data modulation will not function over this system.
This  does  not  present  a  real  problem  since  the  packet-switched  digital  network  can 
handle  high-bandwidth  data  communication  directly,  without  voice  band  modulation. 

This sound detector is fast enough to detect silence between words and
sometimes even between syllables. It does not, however, clip utterances either at the
beginning or the end of the talkspurt. Its dynamic threshold allows it to adapt quickly
to the background noise level. The sound from an extremely noisy environment, such
as a noisy computer room, can make the sound detector slightly sluggish because of
the nonlinear response of the PCM conversion, but the sound detector still performs
reasonably, and the background noise is completely eliminated.

In this graph, the lower of the parallel lines corresponds to minavg in the algorithm; the
higher line is thresh. The leftmost segment consists of background room noise and what little
sound seeped through the wall from an adjacent computer room. The second segment is
speech with the same background noise level. In the third segment, the door to the adjacent
computer room was opened to provide a loud background noise. Note that the silence
threshold climbed to eclipse this noise level completely. In the fourth segment, speech was
again recorded, this time with the computer room background noise, demonstrating that the
detector is still quite effective with the increased noise level. The last spike toward the end
was produced by the slamming of the computer room door. Note the sudden decrease in the
threshold level immediately following that event.

Figure 5: STNI sound detector performance with speech and background noise


ECHO  SUPPRESSION 

Echo is a serious problem with long-distance communications. Echo problems
are a result of the conversion from the two-wire local subscriber loop to the four-wire
telephone handset. In long-distance connections, additional conversions are
done from the subscriber loop two-wire circuit to the long-distance carrier, which is
typically a four-wire connection, and back to the subscriber loop on the distant end.
In a local connection, the time between the initial speech and the echo is only a few
microseconds, sufficiently short that it is not apparent. When the connection spans
several thousand miles, however, the transmission delays are measured in
milliseconds, long enough to be sensed by the human ear. When the signals are
transmitted via satellite, the 44,000-mile trip takes about a quarter second at light
speed in addition to any delays introduced by the transmission equipment itself.
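
As a rough check of the quarter-second figure, assuming a speed of light of about 186,000 miles per second (a standard value that is not stated in the text):

```latex
t \approx \frac{44{,}000\ \text{miles}}{186{,}000\ \text{miles/s}} \approx 0.24\ \text{s}
```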

In the commercial long-distance networks, loss over long analog transmission
lines helps reduce echo: the voice travels once from one end to the other, but the
echo travels from the local station to the distant receiver and then back,
doubling its effective loss. In digital systems, however, there is no inherent loss, and
the echo becomes more apparent. To take advantage of this effect, about 6 dB of loss
has been introduced into the analog interface of the STNI, resulting in a round-trip
echo loss of about 12 dB, as shown in Figure 7. This loss helps reduce the problem,
but is not sufficient, so an echo suppression or cancellation device must also be
employed.

Shown here is the response of the sound detector to a constant, loud tone, in this case a
touch-tone. The threshold climbs slowly, eventually passing the level of the loud tone. Once
it has reached this point, it ceases to climb, stabilizing at a level just above the amplitude of
the tone. When the tone was removed (not shown here), the threshold dropped immediately
back to its level at the beginning of the graph.

Figure 6: STNI sound detector performance with high-level tone


[Diagram labels: 6 dB attenuation; 2-4 wire hybrid]

Figure 7: Loss introduced in STNI analog interface

Echo  suppression  attempts  to  block  echo  by  blocking  outgoing  signals  when 
incoming signals are received. Each end of a connection performs this function in an
effort to prevent echo from returning. Some echo passes in the transition from sound
to  silence,  as  the  echo  suppressor  turns  off.  The  major  problem  with  this  technique, 
however,  is  its  inability  to  deal  with  both  ends  generating  signals  simultaneously. 
When  both  parties  on  the  connection  transmit  concurrently,  neither  will  be  heard.  This 
often  results  in  choppy  speech  if  one  party  attempts  to  interrupt  the  other.  In  a  system 
which  has  very  long  delays,  typical  of  satellite  transmission,  both  parties  often  try  to 
speak  at  the  same  time.  Neither  party  is  aware  of  this  until  the  long  delay  has  elapsed, 
at  which  point  both  stop  talking  and  each  awaits  the  other,  repeating  the  process  until 
one  waits  and  the  other  does  not. 

In  contrast,  echo  cancellation  [8]  allows  full-duplex  traffic,  providing  a  much 
more  comfortable  environment  for  conversation.  An  adaptive  echo  canceller 
automatically  synthesizes  a  filter  with  the  characteristics  of  the  echo  path,  processes 
the  incoming  speech  using  that  filter,  and  subtracts  the  resulting  estimated  echo  from 
the  actual  outgoing  signal.  As  a  result,  the  echo  is  eliminated  from  the  signal. 

Commercially  available  echo-cancellation  devices  are  expensive  and  often  larger 
than  the  entire  STNI  card.  The  possibility  of  using  a  digital  signal  processing  chip  to 
perform  echo  cancellation  is  being  investigated  as  part  of  the  project;  however,  until 
this  work  is  completed,  an  echo-suppression  algorithm  integrated  with  the  sound 
detector is in use, making the connection essentially half-duplex, not unlike
long-distance trunks in the commercial telephone network [4]. When the party at one end
of  a  conversation  is  speaking  into  the  system,  the  other  side  is  temporarily  blocked 
from  transmitting.  Loud  sounds  override  this  mechanism,  so  it  is  possible  to  interrupt 
the  speaking  party  by  speaking  loudly.  The  echo  suppression  is  accomplished  by 
increasing  the  minimum  sound  threshold  for  transmission  when  data  is  being 
received. 
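
The threshold-raising approach can be sketched in C as below. The text does not say by how much the threshold is raised, so ECHO_BOOST is a purely hypothetical value, and the function name should_transmit is likewise an assumption made for illustration; avg and minavg are used as in Figure 4.

```c
#include <stdbool.h>

enum {
    BASE_MARGIN = 8,    /* thresh floats 8 units above minavg (Figure 4) */
    ECHO_BOOST  = 24    /* hypothetical extra margin while receiving      */
};

/* Decide whether a parcel is transmitted. receiving is true while
   speech data is arriving from the packet network. */
bool should_transmit(int avg, int minavg, bool receiving)
{
    int thresh = minavg + BASE_MARGIN;
    if (receiving)
        thresh += ECHO_BOOST;   /* returning echo should fall below this */
    return avg > thresh;
}
```

Under these assumed values, a parcel averaging 30 with minavg at 10 is sent when the line is idle but blocked while the far party is talking, while a loud parcel averaging 60 still gets through, which corresponds to interrupting by speaking loudly.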

DISCONNECT  DETECTION 

At the end of a telephone call, when the caller hangs up, the connection over the
telephone  network  is  broken.  If  a  call  is  in  progress  via  the  STNI  card  when  this 
happens,  this  call  should  be  disconnected  and  the  STNI  made  available  for  another 
call.  On  the  surface,  this  problem  may  appear  simple  to  handle,  but  due  to  the 
idiosyncrasies  of  the  various  telephone  switching  systems,  it  is  rather  difficult. 

Several  common  methods  exist  for  detecting  when  the  calling  telephone  line 
disconnects.  Computer  dial-in  modems  must  detect  a  disconnect  in  order  to 
relinquish the line when the user hangs up. In a modem, however, carriers are always
present  during  a  connection,  making  disconnection  detection  a  simple  matter  of 
noticing  the  absence  of  this  carrier  tone.  The  STNI,  however,  does  not  have  a 
continuous  carrier,  and  therefore  must  detect  other  indications. 

One  method  is  to  use  line  voltage  monitoring.  In  most  modern  central  office 
switching  systems,  the  line  voltage  momentarily  drops  to  zero  when  any  significant 
event,  such  as  disconnection,  occurs.  A  detector  need  merely  signal  the  processor 
when  this  zero-battery  condition  is  sensed.  Some  older  switching  systems,  and  many 
modern  digital  PBX  systems,  however,  do  not  provide  this  signal  and  thus  would  not 
be compatible with this detection method.

Another method is to listen for a dial tone. After a call is disconnected, almost all
switching  systems  will  eventually  return  to  a  dial  tone  to  allow  another  call  to  be 
placed.  Dial  tones  vary  in  frequency  and  amplitude  from  system  to  system,  making 
absolute  dial  tone  detection  somewhat  difficult. 

Analysis  of  several  dial  tones  from  several  different  sources,  including  Bell 
System #1A ESS and GTE EAX central office switches and a Stromberg Carlson
Crossreed  PBX  system,  has  shown  that  dial  tones  share  a  common  characteristic  in 
that  they  maintain  a  constant  amplitude  for  a  period  of  several  seconds.  Most 
systems,  however,  stop  playing  the  dial  tone  after  a  period  of  time  and  revert  to  a  loud 
error  tone,  in  case  the  user  accidentally  left  the  phone  off  the  hook.  The  dial  tone  on 
the  Stromberg  Carlson  Crossreed  PBX  system  was  by  far  the  shortest,  lasting  only 
about  eight  seconds. 


This is the graphic display of the #1A ESS dial tone, via a foreign exchange repeater
circuit and roughly 5 to 6 miles of wire. It is shown here followed by a period of silence and
the off-hook "scream" tone. This was the longest of the dial tones observed, about 16
seconds.

Figure 8: Bell System #1A ESS dial tone

The  method  chosen  for  use  in  the  STNI  is  a  software  analysis  of  the  data  flowing 
from  the  telephone  side  of  the  interface.  It  is  assumed  that  it  is  highly  unlikely  for  the 
signal  amplitude  to  remain  stable  for  more  than  a  second  or  two  during  a  normal 
conversation.  Figure  5  is  a  representative  sample  of  amplitude  variation  in  normal 
speech.  The  use  of  data  modems  is  already  precluded  by  the  sound/silence 
detection  method,  so  this  assumption  poses  no  further  restriction. 

Eight  seconds  is  more  than  long  enough  to  recognize  the  stability  of  a  tone. 
Graphic representations of the collected dial tones (Figures 8, 9, and 10) indicated
that the average amplitude (a statistic already maintained by the STNI) varied no more
than about ±5 PCM sampling units during the time they were played. Using this
information, an algorithm was developed that tracks the average amplitudes over a
period of time and senses relative stability. Eight seconds was chosen as the
validation period, since this is the minimum length of any dial tone studied.

Shown here is the dial tone collected from a GTE EAX switching system. This was
collected from a short tail of no more than a few thousand feet. This particular switch does
not provide any zero-battery signalling after its dial tone times out, but instead switches
directly to a reorder tone which lasts indefinitely.

Figure 9: GTE EAX system dial tone


This sample was collected from a Stromberg Carlson Crossreed PBX system. This was by
far the shortest dial tone collected, lasting only about eight seconds, then terminating in an
indefinite reorder tone without any battery signalling.


Figure 10: Stromberg Carlson Crossreed PBX system dial tone

To  prevent  false  trips  caused  by  stable  background  noise,  a  minimum  level  is 
needed,  below  which  sounds  will  not  be  interpreted  as  dial  tones.  All  dial  tones 
studied  were  in  excess  of  64  PCM  sampling  units  of  amplitude.  Therefore,  the 
detector  has  been  set  up  to  terminate  a  call  on  any  tone  with  stable  amplitude  above 
the level of 64 PCM sampling units for eight seconds.

Some switching systems do not revert to a dial tone at the end of a call, rendering
the dial tone detector ineffective. Two other signals are common: silence for an
indefinite period, or a pulsed tone also for an indefinite time. One solution to the
problem would be to define "absolute silence" and devise a detector for it. This would
handle the bulk of the systems that don't revert to dial tone. In the next major release
of the STNI software, this function will be incorporated. Pulsed tone detection,
however, is somewhat more difficult, and may not be necessary. One other safeguard
is to apply a timeout to any state other than an active conversation. This is really a
function for the device controlling the STNI, since it administers the actual user
interface.
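
The dial tone rule adopted above (an average amplitude that stays above 64 PCM sampling units and within about ±5 units for eight seconds) might be expressed per parcel as in the following sketch. The names tone_watch_t and tone_watch_step are hypothetical, as is the way stability is measured here, namely against the amplitude at the start of the candidate tone; HOLD_PARCELS assumes eight seconds divided by the 22.5 msec parcel time, rounded up.

```c
#include <stdbool.h>

enum {
    MIN_LEVEL     = 64,   /* from the text: dial tones exceeded 64 units   */
    MAX_DEVIATION = 5,    /* from the text: tones varied by about +/-5     */
    HOLD_PARCELS  = 356   /* assumed: 8 s / 22.5 msec per parcel, rounded up */
};

typedef struct {
    int reference;  /* average amplitude when the stable run began */
    int run;        /* consecutive parcels judged stable            */
} tone_watch_t;

/* Call once per parcel with the average amplitude. Returns true when a
   stable tone has lasted long enough to terminate the call. */
bool tone_watch_step(tone_watch_t *w, int avg)
{
    int diff = avg - w->reference;
    if (avg > MIN_LEVEL && w->run > 0 &&
        diff <= MAX_DEVIATION && diff >= -MAX_DEVIATION) {
        w->run++;                 /* still within the stable band */
    } else if (avg > MIN_LEVEL) {
        w->reference = avg;       /* start a new candidate tone */
        w->run = 1;
    } else {
        w->run = 0;               /* too quiet to be a dial tone */
    }
    return w->run >= HOLD_PARCELS;
}
```

Speech, whose average amplitude moves by more than the allowed deviation from parcel to parcel, keeps restarting the run, and stable background noise below the minimum level never starts one.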


DTMF  DIALING 

Software-generated DTMF signals are used to dial out on the telephone line.
These tones are played from prestored sections of ROM, each containing one or more
complete envelopes of modulated tone as a sequence of PCM samples. The dual
frequencies were generated and mixed in advance by a simple FORTRAN program
and then placed in the STNI ROM. The length of each table was chosen by the
FORTRAN program to be a large enough multiple of the wavelength so that the
resulting frequency would be within ±0.5% of the target frequency. The DTMF
specification allows for an error of ±1.5%. This mechanism is also used to generate
industry standard dial, ring indicate, and busy/reorder tones for familiar user
feedback signals. However, frequency tolerances for these tones are somewhat
relaxed to save ROM space. Table 1 is a list of the frequency pairs used for each tone.
Initially,  the  dialer  was  made  nearly  as  fast  as  the  industry  standard  specification 
provides, roughly 50 msec on, 50 msec off. The Bell System #1A ESS was able to
handle  full-speed  DTMF  signalling.  Some  systems,  however,  could  not  parse  input 
this  fast.  The  GTE  EAX  was  only  able  to  handle  tones  with  about  a  70  msec  on  and  off 
time.  As  a  result,  the  interdigit  delays  were  increased  to  accommodate  the  slower 
switching  equipment. 


SPEED 

The  actual  implementation  of  the  software  is  not  in  the  C  language,  as  pictured  in 
the  example  in  Figure  4,  but  in  Z80  assembly  language.  Compiled  code  would  not 
have  provided  sufficient  speed  to  handle  the  detection  and  tone  generation  functions 
in  real  time. 

The Z80 microprocessor is clocked at 3.072 MHz from a 6.144 MHz crystal that
also provides clock signals for the PCM CODEC and other circuits. The
microprocessor receives an interrupt from the CODEC 8000 times per second,
indicating a data-ready condition. The interrupt code handles data transfers, sound
detection, and stored tone generation. A total of 384 clock cycles (3.072 MHz
divided by 8000 interrupts per second) are available to the Z80 between these
interrupts, not all of which may be used by the interrupt
code, since there is also a control process that requires a small percentage of the
available CPU time.

Table 1: Tones generated by the STNI

Tone    Target frequency     Actual frequency
"1"     697 Hz x 1209 Hz     698.4127 Hz x 1206.3492 Hz
"2"     697 Hz x 1336 Hz     698.4127 Hz x 1333.3333 Hz
"3"     697 Hz x 1477 Hz     695.6522 Hz x 1478.2609 Hz
"4"     770 Hz x 1209 Hz     767.1233 Hz x 1205.4794 Hz
"5"     770 Hz x 1336 Hz     771.9298 Hz x 1333.3333 Hz
"6"     770 Hz x 1477 Hz     771.9298 Hz x 1473.6842 Hz
"7"     852 Hz x 1209 Hz     848.4848 Hz x 1212.1212 Hz
"8"     852 Hz x 1336 Hz     848.4848 Hz x 1333.3333 Hz
"9"     852 Hz x 1477 Hz     854.3689 Hz x 1475.7281 Hz
"*"     941 Hz x 1209 Hz     941.1765 Hz x 1210.0640 Hz
"0"     941 Hz x 1336 Hz     941.1765 Hz x 1333.3333 Hz
"#"     941 Hz x 1477 Hz     941.1765 Hz x 1478.9916 Hz
Dial    350 Hz x 440 Hz      351.6483 Hz x 439.5604 Hz
Ring    440 Hz x 480 Hz      436.3636 Hz x 484.8485 Hz
Busy    480 Hz x 620 Hz      486.9565 Hz x 626.0870 Hz

The current implementation consumes about 82 percent of the available time
when transferring PCM signals, leaving only the remaining 18 percent for the control
process. Linear predictive coding (LPC) transmissions require slightly less of the
processor, and tone generation, still less. It would be possible to increase the
processor clock speed to add further capabilities, if necessary, although additional
circuitry would be required.
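
The timing budget can be worked through with two small helper functions. This sketch takes the 384 figure to be processor clock cycles (3.072 MHz divided by 8000 interrupts per second), which is an assumption about the unit; the function names are hypothetical.

```c
/* Clock cycles available to the Z80 between CODEC interrupts. */
long cycles_per_sample(long clock_hz, long sample_rate_hz)
{
    return clock_hz / sample_rate_hz;
}

/* Cycles left for the control process when the interrupt code uses
   percent_used of the budget (integer arithmetic, rounded down). */
long cycles_left(long budget, int percent_used)
{
    return budget * (100 - percent_used) / 100;
}
```

With a 3,072,000 Hz clock and 8000 interrupts per second the budget is 384 cycles; at 82 percent usage roughly 315 cycles go to the interrupt code and about 69 remain per sample period for the control process.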

SUMMARY 

A  compact,  intelligent  telephone  interface  has  been  developed,  based  upon  a 
Z80  microprocessor,  providing  a  connection  between  a  commercial  telephone  line 
and  a  Lincoln  Laboratory  Packet  Voice  Terminal  for  use  in  large-scale  packet  voice 
experiments.  Commanded  by  an  ASCII  serial  line,  the  device  implements 
sound/silence  detection,  tone  signal  generation,  and  telephone  line  control, 
providing  an  extremely  flexible  interface  that  allows  calls  to  be  placed  between  the 
commercial  switched  telephone  network  and  the  DARPA  Wideband  Packet  Satellite 
network.